HSEARCH: fast and accurate protein sequence motif search and clustering
نویسندگان
چکیده
Protein motifs are conserved fragments occurred frequently in protein sequences. They have significant functions, such as active site of an enzyme. Search and clustering protein sequence motifs are computational intensive. Most existing methods are not fast enough to analyze large data sets for motif finding or achieve low accuracy for motif clustering. We present a new protein sequence motif finding and clustering algorithm, called HSEARCH. It converts fixed length protein sequences to data points in high dimensional space, and applies locality-sensitive hashing to fast search homologous protein sequences for a motif. HSEARCH is significantly faster than the brute force algorithm for protein motif finding and achieves high accuracy for protein motif clustering.
منابع مشابه
Designing Of Degenerate Primers-Based Polymerase Chain Reaction (PCR) For Amplification Of WD40 Repeat-Containing Proteins Using Local Allignment Search Method
Degenerate primers-based polymerase chain reaction (PCR) are commonly used for isolation of unidentified gene sequences in related organisms. For designing the degenerate primers, we propose the use of local alignment search method for searching the conserved regions long enough to design an acceptable primer pair. To test this method, a WD40 repeat-containing domain protein from Beauveria bass...
متن کاملIn Silico Analysis of Glutaminase from Different Species of Escherichia and Bacillus
Background: Glutaminase (EC 3.5.1.2) catalyzes the hydrolytic degradation of L-glutamine to L-glutamic acid and has been introduced for cancer therapy in recent years. The present study was an in silico analysis of glutaminase to further elucidate its structure and physicochemical properties.Methods: Forty glutaminase protein sequences from different species of Escherichia and Bacillus obtained...
متن کاملModification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
متن کاملIn Silico Analysis of Primary Sequence and Tertiary Structure of Lepidium Draba Peroxidase
Peroxidase enzymes are vastly applicable in industry and diagnosiss. Recently, we introduced a new kind of peroxidase gene from Lepidium draba (LDP). According to protein multiple sequence alignment results, LDP had 93% similarity and 88.96% identity with horseradish peroxidase C1A (HRP C1A). In the current study we employed in silico tools to determine, to which group of peroxidase enzymes LDP...
متن کاملAn improved opposition-based Crow Search Algorithm for Data Clustering
Data clustering is an ideal way of working with a huge amount of data and looking for a structure in the dataset. In other words, clustering is the classification of the same data; the similarity among the data in a cluster is maximum and the similarity among the data in the different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017